ABSTRACT

User experience of video streaming could be greatly improved
by selecting a high-yet-sustainable initial video bitrate, and
it is therefore critical to accurately predict throughput
before a video session starts. Inspired by previous studies
that show similarity among throughput of similar sessions
(e.g., those sharing the same bottleneck link), we argue
for a cross-session prediction approach, where throughput
measured on sessions of different servers and clients is used
to predict the throughput of a new session. In this paper, we
study the challenges of cross-session throughput prediction,
develop an accurate throughput predictor called DDA, and
evaluate the performance of the predictor with real-world
datasets. We show that DDA predicts throughput more
accurately than simple predictors and conventional machine
learning algorithms; e.g., the 80%ile prediction error of
DDA is > 50% lower than that of other algorithms. We also
show that this improved accuracy enables video players to
select a higher sustainable initial bitrate; e.g., compared to
initial bitrate without prediction, DDA leads to 4x higher
average bitrate.

1 Introduction 

Many Internet applications can benefit from estimating the
client-server throughput. For instance, accurate estimation
of throughput helps content distribution networks to redirect
clients to servers that provide the best performance.
Similarly, peer-to-peer networks select the best peers based on
the estimation of their throughput performance.

Our focus in this paper is on the initial video bitrate
selection when a video player starts. A video player should
ideally pick the highest initial bitrate that is sustainable (i.e.,
below the throughput), in order to ensure the desired user
experience of video streaming. Existing approaches to initial
bitrate selection, however, are inefficient. Table 1 shows
anecdotal evidence of such inefficiencies measured from
several commercial providers. Fixed-bitrate players that use
the same bitrate for the whole video session often
intentionally use a low bitrate to prevent mid-stream rebuffering
(e.g., NFL, Lynda). Even if bitrate can be adapted midstream
(e.g., If5l [8l [T8ll ) the player often conservatively starts with 
a low bitrate and takes a significant time to reach the optimal
bitrate (e.g., Netflix). Furthermore, for short video clips
such adaptation may not reach the desired bitrate before the
video finishes (e.g., Vevo music clips).

Table 1: Limitations of today’s video players
and how they benefit from throughput prediction.
www.lynda.com uses fixed bitrate of 520Kbps (360p) by
default. Netflix (www.netflix.com/WiMovie/701368107trkid=439131)
takes roughly 25 seconds to adapt from
the initial bitrate (560Kbps) to the highest sustainable
bitrate (3Mbps).

The importance of initial bitrate selection (e.g., to avoid
users quitting) naturally makes a case for a predictive
approach: predicting the TCP throughput before a session
starts. Inspired by prior work on shared measurements [25],
we explore a cross-session approach where the
TCP throughput of other sessions is used to predict the
throughput of a new session. Intuitively, we want to build
a prediction model for each client-server pair as a function
of key session features available to us (e.g., ISP, connection
type). This has the natural advantage that it incurs no
additional measurement overhead and leverages all available
sessions even if there is no history between the same client
and server. While this idea is not particularly new, we
believe that revisiting it is timely in light of the need for
video quality optimization and the availability of large-scale
throughput measurements to many video service providers.¹

1 There has been surprisingly little work in exploring this idea for
throughput prediction since the early work of OH.

However, it is challenging to predict throughput accurately
based on other sessions’ throughput, because of a
complex underlying interaction between session features
and throughput. There are two manifestations of this
complexity (see §4 for more details). First, throughput usually
can only be accurately predicted by a combination of multiple
features. For instance, only sessions from a particular
ISP-server-device combination may have similarly low
throughput, while the individual ISP, server or device may manifest
no problem. Second, the best feature combination to predict
throughput differs across sessions. For instance, for sessions
in one ISP, the best feature to predict their throughput is the
last-mile connection (e.g., the last hop is the bottleneck), while
for those in another ISP, the best feature is time of day (e.g.,
due to a strong diurnal pattern).

To address these challenges, we present DDA (Data-Driven
Aggregation), which predicts each session’s throughput
by an expressive prediction model that captures temporal
similarity (e.g., sessions happening within a smaller time
window) and spatial similarity (e.g., sessions matching more
features with the session under prediction) between previously
observed sessions and the session under prediction.
Such prediction models allow DDA to predict throughput
accurately by aggregating sessions with similar throughput.
Instead of using a single prediction model, DDA customizes
the prediction model for each session under prediction. To
pick the prediction model that yields high prediction accuracy
for each session, DDA adopts a data-driven approach:
it searches for the prediction model that works best on
similar existing history sessions.

In summary, this paper makes three key contributions. 

1. First, we use a dataset of real-world throughput measurements
of 9.9 million sessions to show the challenges of
cross-session throughput prediction (§4).

2. Second, we present a concrete cross-session throughput
predictor, DDA, that addresses the above challenges (§5).

3. Finally, our evaluation based on two real-world datasets
shows that DDA can predict throughput more accurately
than simple predictors and conventional machine learning
algorithms, and that due to more accurate throughput
prediction, DDA allows a video player to select a
higher-yet-sustainable initial bitrate (§6).

2 Related Work 

At a high level, our work is related to prior work in measuring
Internet path properties, bandwidth measurements, and
video-specific bitrate selection. With respect to prior
measurement work, our key contribution is showing a practical
data-driven approach for throughput prediction. In terms of
video, our predictive approach offers a more systematic
bitrate selection mechanism.

Measuring path properties: Studies on path properties
have shown prevalence and persistence of network bottlenecks
(e.g., El), constancy of various network metrics ED,
longitudinal patterns of cellular performance (e.g., [22]), and
spatial similarity of network performance (e.g., El). While
DDA is inspired by these insights, it addresses a key gap:
these efforts fall short of providing a prescriptive
algorithm for throughput prediction.

Bandwidth measurement: Unlike prior “path mapping”
efforts (e.g., HTTl (l9l [24l l25l ), DDA uses a data-driven
model based on available session features (e.g., ISP, device).
Specifically, video measurements are taken within a
constrained sandbox environment (e.g., a browser) that does not
offer an interface for path information (e.g., traceroute). Other
approaches use packet-level probing to estimate end-to-end
performance metrics (e.g., EHl6l|23]l26l). Unlike
DDA, these require additional measurement and often need
full client/server-side control, which is often infeasible in the
wild. A third class of approaches leverages the history of the
same client-server pair (e.g., El El |2l][22l|29)). However,
they are less reliable when the available history of the same
client and server is sparse.

Figure 1: Distribution of throughput in the FCC dataset
(x-axis: Throughput/Bitrate (Mbps)).

Bitrate selection: Choosing a high and sustainable bitrate is
critical to video quality of experience (9). Existing methods
(e.g., MM) require either history measurement between
the same client and server or the player to probe the server
to predict the throughput. In contrast, DDA is able to predict
throughput before a session starts. Other approaches
switch bitrate midstream (e.g., lH5ll28ll30h but do not focus
on the initial bitrate problem, which is the focus of DDA.

3 Datasets 

We use two datasets of HTTP throughput measurement to
evaluate DDA’s performance: (i) a primary dataset collected
by the FCC’s Measuring Broadband America platform E) in
September 2013, and (ii) a supplementary dataset collected
by a major VoD provider in China.

FCC dataset: This dataset consists of 9.9 million sessions
and is collected from 6204 clients in the US spanning 17 ISPs.
In each test, a client set up an HTTP connection with one
of the web servers for a fixed duration of 30 seconds and
attempted to download as much of the payload as possible. It
also recorded average throughput at 5-second intervals during
the test. The test used three concurrent TCP connections
to ensure the line was saturated. Readers may refer to ID for
more details on the methodology.

Figure 1 shows the throughput distribution of all sessions.
It also shows the distribution of ideal bitrate (i.e., the highest
bitrate chosen from {0.016, 0.4, 1.0, 2.5, 5.0, 8.0, 16.0,
35.0} Mbps² below the throughput). With perfect throughput
prediction, we should be able to achieve an average bitrate of
26.9Mbps with no session suffering from re-buffering.
Compared to the fixed initial bitrate (e.g., 2.5Mbps) used today,
this suggests substantial room for improvement.

2 The bitrates are the upload encodings recommended by YouTube | [2||6| .

Feature      Description                                  # of unique values
ClientID     Unique ID associated to a client             6204
ISP          ISP of client (e.g., AT&T)                   17
State        The US state where the client is located     52
Technology   The connection technology (e.g., DSL)        5
Target       The server-side identification               30
Downlink     Advertised download speed of the last
             connection (e.g., 15MB/s)                    36
Uplink       Advertised upload speed of the last
             connection (e.g., 5MB/s)                     25

Table 2: Basic statistics of the FCC dataset.

The clients represent a wide spatial coverage of ISPs,
geolocations, and connection technologies (see Table 2).
Although the number of targets is relatively small, the setting
is very close to what real-world application providers
face: the clients are widely distributed while the servers
are relatively few. In addition, its measurement frequency
(i.e., each client fetching content from each server once
every hour) provides a unique opportunity to test the prediction
algorithms’ sensitivity to different measurement frequencies.
For instance, to emulate the effect of reduced data, we take
one (the first) 5-second throughput sample from each test,
and then randomly drop (e.g., 90% of) the available
measurements to simulate a dataset where each client accesses a
server less frequently (e.g., on average once every 10 hours).
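
The random-drop emulation described above can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions: the function name `thin_measurements`, the list-based input, and the seeded generator are hypothetical and not taken from the original methodology.

```python
import random

def thin_measurements(measurements, drop_rate=0.9, seed=0):
    """Randomly drop a fraction of measurements to emulate sparser sampling.

    `measurements` is assumed to already hold one (the first) 5-second
    throughput sample per test; each one is kept with probability
    1 - drop_rate. The seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [m for m in measurements if rng.random() >= drop_rate]
```

With `drop_rate=0.9`, roughly one in ten hourly measurements survives, which corresponds to the "once every 10 hours on average" setting mentioned in the text.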

Supplementary VoD dataset: As a supplementary dataset,
we use a throughput dataset of 0.8 million VoD sessions,
collected by a major video content provider in China. Each
video session has the average throughput and a set of
features that differ from those of the FCC dataset, including
content name, user geolocation, user ID and server IP. This
provides an opportunity to test the sensitivity of the algorithms
to different sets of available features.

4 Simple Predictors are Not Sufficient 

This section starts by showing that simple predictors fail to
yield desirable prediction accuracy, and then shows the
fundamental challenges of cross-session throughput prediction.

• First, we consider the last-mile predictor, which uses
sessions with the same Downlink feature (see definition
in Table 2) to predict a new session’s throughput. This
is consistent with the conventional belief that the last-mile
connection is usually the bottleneck. However, Figures 2(a)
and 2(b) show substantial prediction error³, especially
on the tail, where at least 20% of sessions have more
than 20% error (Figure 2(b)). To put it into perspective,
if a player chooses bitrate based on a throughput
prediction that is 20% higher or lower than the actual, the
video session will experience mid-stream re-buffering or
under-utilize the connection. Finally, Figures 2(c) and
2(d) show that the prediction error is two-sided, suggesting
that simply adding or multiplying the prediction by a
constant factor will not fix the high prediction error.

3 Given throughput prediction $p$ and actual throughput $q$, we
define four types of prediction error: non-normalized absolute
prediction error: $|p - q|$, normalized absolute prediction error:
$|p - q| / q$, non-normalized signed prediction error: $p - q$,
normalized signed prediction error: $(p - q) / q$.



Figure 2: Prediction error of the last-mile predictor 

• Second, we consider the last-sample predictor, which
uses the throughput of the last session of the same
client-target pair to predict the throughput of a future session.
However, the last-sample predictor is not reliable, as the
last sample is too sparse and noisy to offer reliable and
accurate prediction. Figure 3 shows that, similar to the
last-mile predictor, (i) the prediction error, especially on
the tail, is not desirable: more than 25% of sessions have
more than 20% normalized prediction error, and (ii) the
prediction error is two-sided, suggesting the prediction
error cannot be fixed by simply adding a constant to the
prediction or multiplying it by a constant factor.
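
The four error definitions and the tail statistic quoted for both simple predictors can be written down directly. The following is a minimal sketch; the names `prediction_errors` and `tail_fraction` are illustrative assumptions, and the formulas follow the definitions of prediction p and actual throughput q given in the footnote.

```python
def prediction_errors(p, q):
    """Four error types for prediction p and actual throughput q (q > 0)."""
    signed = p - q
    return {
        "abs": abs(signed),            # non-normalized absolute error
        "norm_abs": abs(signed) / q,   # normalized absolute error
        "signed": signed,              # non-normalized signed error
        "norm_signed": signed / q,     # normalized signed error
    }

def tail_fraction(predictions, actuals, threshold=0.2):
    """Fraction of sessions whose normalized absolute error exceeds threshold."""
    errors = [abs(p - q) / q for p, q in zip(predictions, actuals)]
    return sum(e > threshold for e in errors) / len(errors)
```

For example, a statement such as "at least 20% of sessions have more than 20% error" corresponds to `tail_fraction(predictions, actuals, 0.2) >= 0.2`.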



Figure 3: Prediction error of last-sample predictor.
Panels: (a) Non-normalized absolute error (MB/s),
(b) Normalized absolute error (%), (c) Non-normalized
signed error (MB/s), (d) Normalized signed error (%).

Challenges: The fundamental challenge in producing accurate
predictions is the complex underlying interaction between
session features and their throughput. In particular,
there are two manifestations of such high complexity.

First, the simple predictors are both based on a single
feature (e.g., downlink or time), while combinations of multiple
features often have a much greater impact on throughput
than individual features. Intuitively, throughput is often
simultaneously affected by multiple factors (e.g., the
last-mile connection, server load, backbone network
congestion), which means that sessions sharing individual
features may not have similar throughput. Figure 4(a) gives
an example of the effect of feature combinations. It shows the
average throughput of sessions of ISP Frontier using DSL
fetching target samknowsl.lax9.level3.net, and the average
throughput of sessions having the same values on one or two
of the three features. The average throughput when all three
features are specified is at least 50% lower than in any of the
other cases. Thus, to capture such effects, the prediction
algorithm must be expressive enough to combine multiple
features.

Figure 4: Two manifestations of the highly complex
interaction between session features and the throughput.
(a) The average throughput of sessions matching
all and a subset of three features: ISP = Frontier,
Technology = DSL and Target = samknowsl.lax9.level3.net (X).
Time: 18:00-00:00 UTC. (b) The relative information gain of
Target in two ISPs over time.

Second, the simple predictors both apply the same feature to
all sessions, but the impact of the same feature on different
sessions could be different. For instance, throughput is more
sensitive to the last-mile connection when it is unstable (e.g.,
Satellite), and it depends more on the ISP during peak hours
when the network tends to be the bottleneck. Figure 4(b)
shows a real-world example. Relative information gain⁴ is
often used to quantify how useful a feature is for prediction.
The figure shows the relative information gain of
feature Target on the throughput of sessions in two ISPs
over time. It shows that the impact of the same feature varies
across sessions in different hours and in different ISPs.

We will see in §6 that, due to the complex underlying
interactions between features and throughput, it is non-trivial
for conventional machine learning algorithms (e.g., decision
tree, naive Bayes) to yield high accuracy.
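
The relative information gain used in Figure 4(b) is defined as RIG(Y|X) = 1 - H(Y|X)/H(Y). A minimal sketch of that computation for discrete values follows. It assumes throughput has already been discretized into bins (the text does not say how), and the function names and the convention of returning 0 when H(Y) = 0 are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def relative_information_gain(xs, ys):
    """RIG(Y|X) = 1 - H(Y|X)/H(Y) for paired discrete sequences xs, ys."""
    h_y = entropy(ys)
    if h_y == 0:
        return 0.0  # assumption: no uncertainty in Y means nothing to gain
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    n = len(ys)
    # Average conditional entropy of Y given X, weighted by group size.
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return 1.0 - h_y_given_x / h_y
```

A value near 1 means the feature almost fully determines the (binned) throughput, while a value near 0 means it carries little information.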

5 Predicting Throughput Using DDA 

In this section, we present the DDA approach that yields
accurate throughput prediction (§6). We start with an
intuitive description of DDA before formally describing the
algorithm.

4 $RIG(Y|X) = 1 - H(Y|X)/H(Y)$, where $H(Y)$ and $H(Y|X)$
are the entropy of $Y$ and the average conditional entropy of $Y$
given $X$.


Figure 5: Mapping between sessions under prediction
and prediction models.

5.1 Insight of DDA 

At a high level, DDA finds for any session s a prediction 
model - a pair of features and time range, which is used to 
aggregate history sessions that match the specific features 
with s and happened in the specific range. 

To motivate how DDA maps a session to a prediction
model, let us consider two strawmen of session-model
mapping shown in Figure 5. The first strawman maps each
session s to the “Nearest Neighbor” prediction model (dashed
arrows), which aggregates only history sessions matching all
features with s and happening within a very short time (e.g.,
5 minutes) before s. Theoretically, the “Nearest Neighbor”
model should be highly accurate as it represents sessions that
are the most similar to s, but history sessions meeting this
requirement are too sparse to provide reliable prediction.
Alternatively, one can map any s to the “Global” prediction
model (dotted arrows), which aggregates all history sessions
regardless of their features or time. While the “Global”
model is highly reliable as it has substantial samples in
history, its accuracy is low because it does not capture the
effect of feature combinations introduced in the last section.

Ideally, we would like to achieve both high accuracy and
high reliability. To this end, DDA (shown by solid arrows in
Figure 5) differs from the above strawmen in two important
aspects. First, DDA finds for a given session a prediction
model between the Nearest Neighbor and Global prediction
models, so that it strikes a balance between being closer to
Nearest Neighbor for accuracy and being closer to Global for
reliability. The resulting prediction model should be
expressive (e.g., have more features) and yet have enough samples
to offer a reliable prediction. Second, instead of mapping all
sessions to the same prediction model, DDA maps different
sessions to different prediction models, which allows DDA
to address the inherent heterogeneity that the same feature has
different impact on different sessions.

5.2 Design of DDA 

Overall workflow: DDA uses two steps to predict the 
throughput of a new session s. 

1. First, DDA learns a prediction model M* based on history
data. A prediction model is a pair of feature combination
and time range.

2. Second, DDA estimates s’s throughput by the median
throughput of sessions in Agg(M*, s), i.e., the sessions that
match s on the features of M* and are in the time range of M*.
That is, DDA’s prediction is Pred(s) = Median(Agg(M*, s)).
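
The aggregation step can be sketched as follows. This is a simplified illustration, not the authors' implementation: sessions are assumed to be dictionaries with a `time` field (seconds), a `throughput` field and one key per feature, and a prediction model is assumed to be a `(features, window)` pair where `window` is a look-back length in seconds. Same-time-of-day windows are not covered here.

```python
from statistics import median

def agg(model, s, history):
    """Throughputs of history sessions that match s on the model's features
    and fall inside the model's look-back window before s."""
    features, window = model
    return [
        h["throughput"]
        for h in history
        if 0 < s["time"] - h["time"] <= window
        and all(h[f] == s[f] for f in features)
    ]

def pred(model, s, history):
    """Pred(s) = Median(Agg(M, s)); None when no history session matches."""
    samples = agg(model, s, history)
    return median(samples) if samples else None
```

An empty feature tuple with a very long window behaves like the "Global" strawman, whereas all features with a short window behaves like "Nearest Neighbor".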

Learning of prediction model: First, DDA learns a
prediction model M* based on history data from a pool of all
possible prediction models, i.e., pairs of all feature
combinations (i.e., the 2^n subsets of the n features in Table 2)
and possible time windows. Specifically, the possible time
windows include time windows of certain history length (i.e.,
last 10 minutes to last 10 hours) and those of the same time of
day/week (i.e., same hour of day in the last 1-7 days or same
hour of week in the last 1-3 weeks).

The objective of M* is to minimize the prediction error,
$Err(Pred(s), s_w) = |Pred(s) - s_w| / s_w$, where $s_w$ is the
actual throughput of s. That is,

$$M^* = \arg\min_M Err(Median(Agg(M, s)), s_w) \qquad (1)$$

Rather than solving Eq. 1 analytically, DDA takes a
data-driven approach and finds the best prediction model over a
set of history sessions Est(s) (defined shortly). Formally,
the process can be written as follows:

$$M^* = \arg\min_M \frac{1}{|Est(s)|} \sum_{s' \in Est(s)} Err(Median(Agg(M, s')), s'_w) \qquad (2)$$



Est(s) should include sessions that are likely to share the 
best prediction model with s. In DDA, Est(s) consists of 
sessions that match features Target, ISP, Technology and 
Downlink with s and happened within 4 hours before s. 
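
The model search of Eq. 2 can be sketched as an exhaustive loop over the candidate pool. This is a hedged sketch under assumptions: sessions are dictionaries with `time` (seconds), `throughput` and feature keys; only look-back windows are modeled; the helper names are hypothetical; and sessions in Est(s) with no matching history are skipped, a detail the text does not specify.

```python
from itertools import combinations
from statistics import median

def matching(model, s, history):
    """History sessions matching s on the model's features within its window."""
    features, window = model
    return [h for h in history
            if 0 < s["time"] - h["time"] <= window
            and all(h[f] == s[f] for f in features)]

def err(predicted, actual):
    """Normalized absolute prediction error."""
    return abs(predicted - actual) / actual

def candidate_models(features, windows):
    """All (feature subset, window) pairs: 2^n subsets times len(windows)."""
    for r in range(len(features) + 1):
        for subset in combinations(features, r):
            for window in windows:
                yield (subset, window)

def learn_model(s, history, features, windows,
                est_features=("target", "isp", "technology", "downlink"),
                est_window=4 * 3600):
    """Return the model with the lowest mean error over Est(s) (Eq. 2)."""
    est = matching((est_features, est_window), s, history)
    best_model, best_err = None, float("inf")
    for model in candidate_models(features, windows):
        errors = []
        for s2 in est:
            samples = [h["throughput"] for h in matching(model, s2, history)]
            if samples:  # assumption: skip s' that has no matching history
                errors.append(err(median(samples), s2["throughput"]))
        if errors and sum(errors) / len(errors) < best_err:
            best_model, best_err = model, sum(errors) / len(errors)
    return best_model
```

The search is exponential in the number of features, which is tolerable for the handful of features in Table 2 but would need pruning for richer feature sets.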


Estimating throughput: Second, DDA estimates s’s
throughput by the learned prediction model M*. To make
the prediction Pred(s) reliable, DDA ensures that Pred(s)
is based on a substantial number of sessions in Agg(M*, s).
Therefore, if M* yields Agg(M*, s) with fewer than 20
sessions, DDA will remove that model from the pool and learn
the prediction model as in the first step again. We have also
found that for some pairs of client and server, DDA’s
prediction error is one-sided. For instance, the throughput of a
particular client-server pair is 1Mbps, while the best
prediction model always predicts 2Mbps (i.e., a one-sided 100%
error). We compensate for this error by changing Median(S) to
Median(S, k), which reports the median of throughput in S
times a factor k. To train a proper value of k, DDA first uses
Eq. 2 to learn M* by assuming k = 1, and then trains
the best factor k* for s as follows:


$$k^* = \arg\min_k \frac{1}{|Est(s)|} \sum_{s' \in Est(s)} Err(Median(Agg(M^*, s'), k), s'_w)$$

where k is chosen from 0 to 5. Finally, the prediction made
by DDA will be Median(Agg(M*, s), k*).
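
Training the correction factor k* amounts to a one-dimensional search. The sketch below assumes the unscaled medians Median(Agg(M*, s')) have already been computed for each s' in Est(s); the function name `learn_k` and the grid step of 0.1 are assumptions, since the text only states that k is chosen from 0 to 5.

```python
def learn_k(base_predictions, actuals, step=0.1, k_max=5.0):
    """Grid-search the factor k in [0, k_max] that minimizes the mean
    normalized absolute error of k * base_prediction against the actuals."""
    best_k, best_err = 1.0, float("inf")
    for i in range(int(round(k_max / step)) + 1):
        k = i * step
        mean_err = sum(abs(k * p - q) / q
                       for p, q in zip(base_predictions, actuals)) / len(actuals)
        if mean_err < best_err:
            best_k, best_err = k, mean_err
    return best_k
```

In the one-sided example from the text (actual 1Mbps, predicted 2Mbps), the search settles on a factor of about 0.5.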


6 Evaluation 

This section evaluates the prediction accuracy of DDA (§6.1)
and how much DDA improves video bitrate (§6.2). Overall,
our findings show the following:

1. DDA can predict more accurately than other predictors. 

2. With higher accuracy, DDA can select better bitrate. 


6.1 Prediction accuracy 

Methodology: As points of comparison, we use
implementations of Decision Tree (DT) and Naive Bayes (NB) with
default configurations in Weka, a popular ML tool Q. For
a fair comparison, all algorithms use the same set of
features. We also compare them with the last-mile predictor (LM)⁵
and the last-sample predictor (LS), introduced in §4. We update
the models of the other algorithms in the same way as DDA: for
each session under prediction, we use all available history
data before it as the training data. Each session’s timestamp is
grouped into 10-minute intervals and used as a discrete time
feature. By default, we use absolute normalized error (§5)
as the metric of prediction error, and the results are based on
the FCC dataset, unless specified otherwise.

Distribution of prediction error: Figure 6 shows the
distribution of prediction error of DDA and other algorithms.
DDA outperforms all algorithms, especially on the tail of
prediction error. For the FCC dataset (Figure 6(a)), the 80%ile
prediction error of DDA is 50% to 80% lower than that
of other algorithms, and DDA has fewer than 20% of sessions
with more than 10% prediction error, while all other
algorithms have at least 30% of sessions with more than 10% error.
While the VoD dataset in general has higher prediction error
than the FCC dataset (due to the lack of some features such
as last-connection and longitudinal information), DDA still
outperforms other algorithms, showing that DDA is robust
to the available features.



Figure 6: CDF of prediction error. Panels: (a) FCC,
(b) VoD China.


Dissecting prediction accuracy of DDA: To evaluate the
prediction accuracy in more detail, we first partition the
prediction error by the four most popular ISPs (Figure 7(a)) and
by different times of day (Figure 7(b)). Although the ranking of
algorithms varies across different partitions, DDA consistently
outperforms the other two algorithms (DT, NB), especially in
the tail at the 90%ile. Finally, Figure 7(c) evaluates DDA’s
sensitivity to measurement frequency by comparing the
distribution of prediction error of the three algorithms under
different random drop rates (§3). It shows that DDA is more
robust to measurement frequency than the other algorithms.

6.2 Improvement of bitrate selection 

Figure 7: Dissecting DDA prediction error. The boxes show the
10-20-50-80-90 percentile. Panels: (a) Prediction error vs. ISP,
(b) Prediction error vs. time of day (UTC).

5 LM is not applicable to the VoD dataset as it has no feature related
to last-mile connection.

Methodology: To evaluate the bitrate selected based on a
prediction algorithm, we consider a simple bitrate selection
algorithm (while a more complex algorithm is possible, it is
not the focus of this paper): given a session for which the
prediction algorithm predicts throughput w, the bitrate
selection algorithm simply picks the highest bitrate from
{0.016, 0.4, 1.0, 2.5, 5.0, 8.0, 16.0, 35.0}Mbps (§3) that is
below αw, where α represents the safety margin (e.g., higher α
means higher bitrate at the risk of exceeding the throughput).
We use two metrics to evaluate the performance: (1) AvgBitrate,
the average value of the picked bitrate, and (2) GoodRatio,
the percentage of sessions with no re-buffering (i.e., the picked
bitrate is lower than the throughput). Therefore, one bitrate
selection algorithm is better than another if it has both higher
AvgBitrate and higher GoodRatio. As a point of reference, the
“Global” bitrate selection algorithm picks the same bitrate
for any session, which represents how today’s players select
the starting bitrate. As an optimal reference point, the “Ideal”
bitrate selection algorithm picks the bitrate identical to the
throughput for any session (§3).
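
The selection rule and the two metrics can be sketched as below. The bitrate ladder is the one given in the text; the function names and the fallback to the lowest rung when no rung lies below αw are assumptions made for this sketch.

```python
BITRATES_MBPS = [0.016, 0.4, 1.0, 2.5, 5.0, 8.0, 16.0, 35.0]

def select_bitrate(predicted, alpha=0.8, ladder=BITRATES_MBPS):
    """Highest ladder bitrate strictly below alpha * predicted throughput.

    Assumption: when no rung qualifies, fall back to the lowest rung.
    """
    feasible = [b for b in ladder if b < alpha * predicted]
    return max(feasible) if feasible else min(ladder)

def evaluate(predictions, actuals, alpha=0.8):
    """Return (AvgBitrate, GoodRatio) for predicted vs. actual throughputs."""
    picked = [select_bitrate(w, alpha) for w in predictions]
    avg_bitrate = sum(picked) / len(picked)
    # A session is "good" when the picked bitrate is below actual throughput.
    good_ratio = sum(b < q for b, q in zip(picked, actuals)) / len(picked)
    return avg_bitrate, good_ratio
```

Sweeping `alpha` over a range of values and plotting the resulting pairs would trace out an AvgBitrate-GoodRatio tradeoff curve for a given predictor.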

Overall improvement: Table 3 compares DDA-based
bitrate selection and “Global”. In both algorithms, we use
α = 0.8 for the FCC dataset, and α = 0.6 for the VoD
dataset. In both datasets, DDA leads to higher AvgBitrate
and GoodRatio, and DDA is much closer to “Ideal” than
“Global”. Note that the VoD dataset still has substantial
room for improvement due to the relatively low prediction
accuracy (Figure 6(b)).



          FCC                        VoD China
          AvgBitrate   GoodRatio     AvgBitrate   GoodRatio
Global    2.5Mbps      88.2%         2.5Mbps      77.5%
DDA       13.3Mbps     99.5%         2.7Mbps      88.2%
Ideal     27.2Mbps     100%          3.5Mbps      100%

Table 3: Comparing DDA and “Global” in AvgBitrate
and GoodRatio.


Bitrate selection vs. prediction accuracy: Next, we
examine the intuition that higher prediction accuracy leads to
higher performance of bitrate selection. Table 4 shows the
bitrate selection performance as a function of mean and median
prediction error. We consider four prediction algorithms (DDA,
DT, LS, NB). For a fair comparison, the bitrate selection
algorithm always uses α = 0.8. As prediction error increases,
the performance of bitrate selection degrades in terms of
both lower AvgBitrate and lower GoodRatio.

       Mean/median
       prediction error   AvgBitrate   GoodRatio
DDA    9.0%/2.3%          13.3Mbps     99.5%
DT     23.1%/3.4%         13.0Mbps     91.0%
LS     28.7%/9.8%         12.3Mbps     90.6%
NB     91.4%/17.1%        12.2Mbps     71.8%

Table 4: Higher accuracy means better bitrate selection.

Understanding bitrate improvement: There is a natural
tradeoff between AvgBitrate and GoodRatio (e.g., higher α
means higher AvgBitrate at the cost of lower GoodRatio).
Figure 8(a) shows this tradeoff for various bitrate selection
algorithms as the value of α is adjusted. DDA-based
bitrate selection strikes a better tradeoff of higher
AvgBitrate and higher GoodRatio (i.e., more towards the
top-right corner of the figure).

Finally, we would like to test the robustness of DDA-based
bitrate selection in different regions. Figure 8(b) compares
the AvgBitrate of DDA with “Global” and “Ideal” in
four popular ISPs. DDA uses the maximum α on the tradeoff
curve in Figure 8(a) that ensures at least 95% GoodRatio,
while “Global” only has a GoodRatio of 88.2%. Across all
ISPs, DDA consistently outperforms “Global” and achieves
at least 60% of the “Ideal”.



Figure 8: In-depth analysis of bitrate selection. Panels:
(a) AvgBitrate-GoodRatio tradeoff, (b) Performance by ISP.

7 Conclusion 

Many Internet applications can benefit from estimating
end-to-end throughput. This paper focuses on its application
to initial video bitrate selection. We present DDA, which
leverages the throughput measured by different clients and
servers to achieve accurate throughput prediction before a
new session starts. Evaluation based on two real-world
datasets shows that (i) DDA predicts throughput more accurately
than simple predictors and conventional machine learning
algorithms, and (ii) with more accurate throughput prediction,
a player can choose a higher-yet-sustainable bitrate (e.g.,
compared to initial bitrate without prediction, DDA leads to
4x higher average bitrate with fewer sessions using a bitrate
exceeding the throughput).